inference scheme
Appendices
In Equation 4, maximization is carried out over the input y to the inverse map and over the input z, which is captured in p̂ in the above optimization problem; that is, maximizing over z in Equation 4 is equivalent to choosing p̂ when p̂ is restricted to a singleton (Dirac-delta) distribution. Since Equation 4 describes a constrained optimization problem, we solve it in practice via dual gradient descent. Gradient descent is used to optimize the Lagrangian of Equation 4, shown in Equation 5, with the constraint on p(z) rewritten as a constraint on log p(z), since log p(z) is easier to use numerically in stochastic optimization.
At each iteration, it samples a function from this distribution and queries the point x*_t that greedily minimizes this function.
Information Ratio
Russo and Van Roy [30] related the expected regret of TS to its expected information gain, i.e., the expected reduction in the entropy of the posterior distribution of X.
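As a minimal sketch of dual gradient descent on a constrained problem of this shape, the toy below maximizes a hypothetical score f(z) = -(z - 3)^2 subject to log p(z) ≥ log ε, where p is a standard normal density and the hypothetical threshold `LOG_EPS` is chosen so that the constraint reduces to |z| ≤ 1. Each iteration takes a primal ascent step on the Lagrangian and a projected descent step on the multiplier λ. The actual objective and constraint of Equation 4 are not reproduced here, and the single scalar z stands in for the pair (y, z).

```python
import math

# Hypothetical threshold: log p(z) >= LOG_EPS is equivalent to |z| <= 1 for a standard normal p.
LOG_EPS = -0.5 - 0.5 * math.log(2 * math.pi)

def f(z):
    """Hypothetical score to maximize (unconstrained optimum at z = 3)."""
    return -(z - 3.0) ** 2

def grad_f(z):
    return -2.0 * (z - 3.0)

def log_p(z):
    """Log-density of a standard normal prior."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def grad_log_p(z):
    return -z

def dual_gradient_descent(z=0.0, lam=0.0, lr_z=0.01, lr_lam=0.05, steps=5000):
    for _ in range(steps):
        # Primal step: gradient ascent on L(z, lam) = f(z) + lam * (log p(z) - LOG_EPS).
        z += lr_z * (grad_f(z) + lam * grad_log_p(z))
        # Dual step: gradient descent on lam, projected onto lam >= 0.
        lam = max(0.0, lam - lr_lam * (log_p(z) - LOG_EPS))
    return z, lam

z_opt, lam_opt = dual_gradient_descent()
```

Under these assumptions the iterates should settle near z = 1 with λ near 4: at that point the score gradient f'(1) = 4 is balanced by λ times the constraint gradient -1, which is the KKT condition for the active constraint.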
Analysis of Brain States from Multi-Region LFP Time-Series
Kyle R. Ulrich, David E. Carlson, Wenzhao Lian, Jana S. Borg, Kafui Dzirasa, Lawrence Carin
The local field potential (LFP) is a source of information about the broad patterns of brain activity, and the frequencies present in these time-series measurements are often highly correlated between regions. It is believed that these regions may jointly constitute a "brain state," relating to cognition and behavior. An infinite hidden Markov model (iHMM) is proposed to model the evolution of brain states, based on electrophysiological LFP data measured at multiple brain regions. A brain state influences the spectral content of each region in the measured LFP. A new state-dependent tensor factorization is employed across brain regions, and the spectral properties of the LFPs are characterized in terms of Gaussian processes (GPs). The LFPs are modeled as a mixture of GPs, with state- and region-dependent mixture weights, and with the spectral content of the data encoded in GP spectral mixture covariance kernels. The model is able to estimate the number of brain states and the number of mixture components in the mixture of GPs. A new variational Bayesian split-merge algorithm is employed for inference. The model infers state changes as a function of external covariates in two novel electrophysiological datasets, using LFP data recorded simultaneously from multiple brain regions in mice; the results are validated and interpreted by subject-matter experts.
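To make the idea of state- and region-dependent mixture weights over spectral mixture kernels concrete, here is a sketch that assumes the standard one-dimensional spectral mixture kernel of Wilson and Adams, k(τ) = Σ_q w_q exp(-2π²τ²v_q) cos(2πτμ_q). The paper's exact parameterization may differ. In this sketch every (state, region) pair shares the same Gaussian spectral components and differs only in its weights. The component means `MEANS`, variances `VARIANCES`, and the `WEIGHTS` table are hypothetical values, not quantities taken from the paper.

```python
import math

# Hypothetical shared spectral components: mean frequencies (Hz) and spectral variances.
MEANS = [8.0, 40.0]
VARIANCES = [1.0, 4.0]

# Hypothetical state- and region-dependent mixture weights, one weight per component.
WEIGHTS = {
    ("state0", "regionA"): [1.0, 0.1],
    ("state1", "regionA"): [0.2, 0.9],
}

def spectral_mixture_kernel(tau, weights, means, variances):
    """1-D spectral mixture kernel evaluated at time lag tau (seconds)."""
    return sum(
        w * math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v) * math.cos(2.0 * math.pi * mu * tau)
        for w, mu, v in zip(weights, means, variances)
    )

def state_region_kernel(tau, state, region):
    """Covariance at lag tau for one region while the brain is in a given state."""
    return spectral_mixture_kernel(tau, WEIGHTS[(state, region)], MEANS, VARIANCES)

for state in ("state0", "state1"):
    print(state, [round(state_region_kernel(lag, state, "regionA"), 4) for lag in (0.0, 0.01, 0.05)])
```

Because only the weights change with the state, switching states reweights which frequency bands dominate a region's covariance while the component shapes stay fixed.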
Uncovering Scaling Laws for Large Language Models via Inverse Problems
Verma, Arun, Wu, Zhaoxuan, Zhou, Zijian, Lin, Xiaoqiang, Chen, Zhiliang, Sim, Rachael Hwee Ling, Qiao, Rui, Wang, Jingtan, Bui, Nhung, Niu, Xinyuan, Hu, Wenyang, Lau, Gregory Kang Ruey, Khoo, Zi-Yu, Zhao, Zitong, Xu, Xinyi, Hemachandra, Apivich, Ng, See-Kiong, Low, Bryan Kian Hsiang
Large Language Models (LLMs) are large-scale pretrained models that have achieved remarkable success across diverse domains. These successes have been driven by unprecedented complexity and scale in both data and computation. However, due to the high costs of training such models, brute-force trial-and-error approaches to improving LLMs are not feasible. Inspired by the success of inverse problems in uncovering fundamental scientific laws, this position paper advocates that inverse problems can also efficiently uncover scaling laws that guide the building of LLMs to achieve desired performance with significantly better cost-effectiveness.
Near Optimal Inference for the Best-Performing Algorithm
Consider a collection of competing machine learning algorithms. Given their performance on a benchmark of datasets, we would like to identify the best-performing algorithm, specifically, the algorithm most likely to rank highest on a future, unseen dataset. A natural approach is to select the algorithm that demonstrates the best performance on the benchmark. However, in many cases the performance differences are marginal, and additional candidates may also be considered. We formulate this problem as subset selection for multinomial distributions. Formally, given a sample from a countable alphabet, our goal is to identify a minimal subset of symbols that includes the most frequent symbol in the population with high confidence. In this work, we introduce a novel framework for the subset selection problem. We provide both asymptotic and finite-sample schemes that significantly improve upon currently known methods. In addition, we provide matching lower bounds, demonstrating the favorable performance of our proposed schemes.
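The subset selection formulation can be illustrated with a simple baseline. This is a Bonferroni-corrected pairwise binomial test, not the scheme proposed in the paper. Here each count is the hypothetical number of benchmark datasets on which an algorithm ranked first, and the candidate set is assumed to be finite. A symbol is dropped only if a one-sided conditional test shows that it is significantly rarer than the empirical leader. Because a union bound covers the k - 1 possible comparisons, the true most frequent symbol is excluded with probability at most `alpha`.

```python
from math import comb

def binom_cdf_half(x, m):
    """P(X <= x) for X ~ Binomial(m, 1/2), computed exactly with integers."""
    return sum(comb(m, i) for i in range(x + 1)) / 2 ** m

def select_subset(counts, alpha=0.05):
    """Return a subset of symbols that contains the most frequent one w.p. >= 1 - alpha."""
    k = len(counts)
    if k == 1:
        return set(counts)
    leader = max(counts, key=counts.get)
    threshold = alpha / (k - 1)  # Bonferroni correction over k - 1 comparisons
    subset = {leader}
    for sym, n_j in counts.items():
        if sym == leader:
            continue
        # Conditional on n_leader + n_j = m, n_j ~ Binomial(m, p_j / (p_j + p_leader)).
        # Under H0 (p_j >= p_leader) that probability is >= 1/2, so this p-value is valid.
        m = counts[leader] + n_j
        p_value = binom_cdf_half(n_j, m)
        if p_value > threshold:
            subset.add(sym)  # cannot rule out that sym is the most frequent
    return subset

wins = {"A": 60, "B": 55, "C": 10}  # hypothetical first-place counts
print(select_subset(wins))
```

With these hypothetical counts, B stays in the subset because 55 versus 60 is not a significant gap, while C is excluded. Schemes with tighter guarantees would shrink such subsets further.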
A Class Inference Scheme With Dempster-Shafer Theory for Learning Fuzzy-Classifier Systems
Shiraishi, Hiroki, Ishibuchi, Hisao, Nakata, Masaya
The decision-making process significantly influences the predictions of machine learning models. This is especially important in rule-based systems such as Learning Fuzzy-Classifier Systems (LFCSs) where the selection and application of rules directly determine prediction accuracy and reliability. LFCSs combine evolutionary algorithms with supervised learning to optimize fuzzy classification rules, offering enhanced interpretability and robustness. Despite these advantages, research on improving decision-making mechanisms (i.e., class inference schemes) in LFCSs remains limited. Most LFCSs use voting-based or single-winner-based inference schemes. These schemes rely on classification performance on training data and may not perform well on unseen data, risking overfitting. To address these limitations, this article introduces a novel class inference scheme for LFCSs based on the Dempster-Shafer Theory of Evidence (DS theory). The proposed scheme handles uncertainty well. By using the DS theory, the scheme calculates belief masses (i.e., measures of belief) for each specific class and the "I don't know" state from each fuzzy rule and infers a class from these belief masses. Unlike the conventional schemes, the proposed scheme also considers the "I don't know" state that reflects uncertainty, thereby improving the transparency and reliability of LFCSs. Applied to a variant of LFCS (i.e., Fuzzy-UCS), the proposed scheme demonstrates statistically significant improvements in terms of test macro F1 scores across 30 real-world datasets compared to conventional voting-based and single-winner-based fuzzy inference schemes. It forms smoother decision boundaries, provides reliable confidence measures, and enhances the robustness and generalizability of LFCSs in real-world applications. Our implementation is available at https://github.com/YNU-NakataLab/jUCS.
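Evidence combination in DS theory is commonly done with Dempster's rule, illustrated below. This is a minimal sketch. It assumes that each rule assigns mass only to singleton classes and to the whole frame, which plays the role of the "I don't know" state, and that the rule masses are fused with Dempster's rule before taking the class with the largest combined mass. The abstract does not say how Fuzzy-UCS derives masses from rule matching, so the input masses below are hypothetical. The `THETA` key is an illustrative name.

```python
from functools import reduce

THETA = "I don't know"  # mass on the whole frame of discernment (uncertainty)

def dempster_combine(m1, m2, classes):
    """Dempster's rule for mass functions supported on singleton classes plus THETA."""
    combined = {
        c: m1.get(c, 0.0) * m2.get(c, 0.0)
        + m1.get(c, 0.0) * m2.get(THETA, 0.0)
        + m1.get(THETA, 0.0) * m2.get(c, 0.0)
        for c in classes
    }
    combined[THETA] = m1.get(THETA, 0.0) * m2.get(THETA, 0.0)
    # The non-conflicting products sum to 1 - K, where K is the conflicting mass.
    norm = sum(combined.values())
    if norm == 0.0:
        raise ValueError("total conflict: masses cannot be combined")
    return {h: v / norm for h, v in combined.items()}

def infer_class(rule_masses, classes):
    """Fuse all rule masses, then pick the class with the largest combined mass."""
    combined = reduce(lambda a, b: dempster_combine(a, b, classes), rule_masses)
    return max(classes, key=lambda c: combined.get(c, 0.0)), combined

classes = ["a", "b"]
rules = [  # hypothetical belief masses produced by two matching fuzzy rules
    {"a": 0.6, "b": 0.1, THETA: 0.3},
    {"a": 0.5, "b": 0.2, THETA: 0.3},
]
label, masses = infer_class(rules, classes)
print(label, masses)
```

For these inputs, both rules lean toward class "a", so the combined mass on "a" rises to about 0.76. The remaining "I don't know" mass of about 0.11 stays available as an explicit measure of uncertainty.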
Reviews: Variational Bayes on Monte Carlo Steroids
This paper provides theoretical bounds that are tighter than existing variational bounds for the problem of learning latent variable models. The authors extend existing theory on hash-based learning and amortized inference to design a black-box learning algorithm, which they then apply to learning a Sigmoid Belief Network. The main advantage of this approach seems to be the partitioning of the search space for posterior distributions into buckets (subsets) that are faster to search than with a typical sampling method. The proposed inference scheme then applies mean-field inference (used heavily in the context of variational inference) within each subset. One of the main technical contributions is the tighter bound on the likelihood using two aggregate estimators, which extends existing work specific to undirected graphical models to the directed setting.
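The review does not name the hash family that partitions the search space. As an assumption, the sketch below uses random parity (XOR) constraints h(x) = Ax + b mod 2, a pairwise-independent family commonly used in hash-based inference. It shows how m such constraints split the space of n-bit latent configurations into at most 2^m buckets. This illustrates the partitioning idea only and is not the paper's algorithm. The function names are hypothetical.

```python
import itertools
import random

def random_parity_hash(n, m, seed=0):
    """Hypothetical hash h(x) = A x + b (mod 2) with a random m x n binary matrix A."""
    rng = random.Random(seed)
    A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]
    b = [rng.randint(0, 1) for _ in range(m)]

    def h(x):
        # Each output bit is the parity of a random subset of x's bits, flipped by b_j.
        return tuple((sum(a * xi for a, xi in zip(row, x)) + b_j) % 2 for row, b_j in zip(A, b))

    return h

def partition(n, m, seed=0):
    """Enumerate all 2**n binary vectors and group them into buckets by hash value."""
    h = random_parity_hash(n, m, seed)
    buckets = {}
    for x in itertools.product((0, 1), repeat=n):
        buckets.setdefault(h(x), []).append(x)
    return buckets

buckets = partition(n=8, m=3)
print({key: len(members) for key, members in buckets.items()})
```

Because the hash is an affine map over GF(2), every nonempty bucket has exactly 2^(n - rank(A)) members. A per-bucket approximation, such as mean-field inference in the reviewed method, therefore works on a subset of known size.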
Adaptive posterior distributions for uncertainty analysis of covariance matrices in Bayesian inversion problems for multioutput signals
Curbelo, E., Martino, L., Llorente, F., Delgado-Gomez, D.
In this paper we address the problem of performing Bayesian inference for the parameters of a nonlinear multi-output model and the covariance matrix of the different output signals. We propose an adaptive importance sampling (AIS) scheme for multivariate Bayesian inversion problems, which is based on two main ideas: the variables of interest are split into two blocks, and the inference takes advantage of known analytical optimization formulas. We estimate both the unknown parameters of the multivariate nonlinear model and the covariance matrix of the noise. In the first part of the proposed inference scheme, a novel AIS technique called adaptive target adaptive importance sampling (ATAIS) is designed, which alternates iteratively between an IS technique over the parameters of the nonlinear model and a frequentist approach for the covariance matrix of the noise. In the second part of the proposed inference scheme, a prior density over the covariance matrix is considered, and the cloud of samples obtained by ATAIS is recycled and re-weighted to obtain a complete Bayesian study over the model parameters and covariance matrix. ATAIS is the main contribution of the work. Additionally, inverted layered importance sampling (ILIS) is presented as a possible compelling algorithm (but based on a conceptually simpler idea). Different numerical examples show the benefits of the proposed approaches.
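The alternation in the first part of the scheme can be sketched as follows. This is a simplified sketch in the spirit of that alternation, not the authors' ATAIS. It assumes a hypothetical two-output model with a scalar parameter θ, a flat prior on θ, and a Gaussian proposal adapted to the weighted sample moments. Each iteration runs an importance-sampling step for θ under the current noise covariance. It then runs a frequentist step that sets the covariance to its maximum-likelihood value, the residual covariance, at the current estimate of θ.

```python
import numpy as np

def model(theta, t):
    """Hypothetical two-output nonlinear model g(theta, t) with a scalar parameter."""
    return np.stack([np.sin(theta * t), np.cos(theta * t)], axis=-1)  # shape (T, 2)

def atais_sketch(y, t, n_iter=20, n_samples=500, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = 1.0, 1.0          # Gaussian proposal over theta (assumed initialization)
    cov = np.eye(y.shape[1])      # initial guess for the noise covariance
    T = len(t)
    for _ in range(n_iter):
        thetas = rng.normal(mu, sigma, n_samples)
        resid = y[None] - np.stack([model(th, t) for th in thetas])  # (N, T, 2)
        prec = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        # Gaussian log-likelihood under the current covariance (flat prior on theta assumed).
        quad = np.einsum("ntd,de,nte->n", resid, prec, resid)
        loglik = -0.5 * (quad + T * logdet)
        logq = -0.5 * ((thetas - mu) / sigma) ** 2 - np.log(sigma)
        logw = loglik - logq
        w = np.exp(logw - logw.max())  # subtract the max for numerical stability
        w /= w.sum()
        # IS step: adapt the proposal to the weighted moments of the samples.
        mu = float(np.sum(w * thetas))
        sigma = max(float(np.sqrt(np.sum(w * (thetas - mu) ** 2))), 1e-3)
        # Frequentist step: ML noise covariance of the residuals at the current theta estimate.
        r = y - model(mu, t)
        cov = r.T @ r / T + 1e-9 * np.eye(y.shape[1])
    return mu, cov

data_rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 200)
y = model(1.5, t) + data_rng.normal(0.0, [0.1, 0.2], size=(200, 2))  # synthetic data
theta_hat, cov_hat = atais_sketch(y, t)
print(theta_hat, cov_hat)
```

On this synthetic data the estimate of θ should approach the generating value 1.5, and the covariance should approach the injected noise level. The second, fully Bayesian stage (a covariance prior plus re-weighting the recycled samples) is not shown.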
Reviews: Dirichlet belief networks for topic structure learning
This submission proposes a new prior on the topic-word distribution in latent topic models. The model defines a multi-layer feedforward graph, where each layer contains a set of valid multinomial distributions over the vocabulary, and weighted combinations of each layer's "topics" are used as the Dirichlet prior for the "topics" of the next layer. The key purported benefits are sharing of statistical strength, inference of a hierarchy of interpretable "abstract" topics, and modularity that allows composition with other topic model variants that modify the document-topic distributions. The authors present an efficient, fully collapsed Gibbs sampler inference scheme; I did not thoroughly check the derivation, but it seems plausible. However, what is the computational complexity (and relative "wall clock" cost) of the given inference scheme?
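The layered prior described in the review can be sketched generatively. Each child topic is drawn from a Dirichlet whose concentration vector is a weighted combination of the parent layer's topics. This is a sketch under stated assumptions: the Gamma(1, 1) connection weights, the symmetric Dirichlet(1) top layer, the layer sizes, and the small floor on the concentrations are all hypothetical choices, not taken from the submission. It covers only the prior, not the collapsed Gibbs sampler.

```python
import numpy as np

def sample_dirbn_topics(vocab_size=20, layer_sizes=(3, 5, 8), seed=0):
    """Sample topics layer by layer; the last layer would generate the words."""
    rng = np.random.default_rng(seed)
    # Top layer: topics from a symmetric Dirichlet (assumed concentration 1.0).
    topics = rng.dirichlet(np.ones(vocab_size), size=layer_sizes[0])  # (K0, V)
    layers = [topics]
    for k_child in layer_sizes[1:]:
        k_parent = topics.shape[0]
        # Hypothetical nonnegative connection weights beta[j, k] from parent j to child k.
        beta = rng.gamma(shape=1.0, scale=1.0, size=(k_parent, k_child))
        # Weighted combination of parent topics gives each child's Dirichlet concentration.
        alpha = beta.T @ topics  # (K_child, V)
        alpha = np.maximum(alpha, 1e-10)  # numerical safeguard: Dirichlet needs alpha > 0
        topics = np.stack([rng.dirichlet(a) for a in alpha])
        layers.append(topics)
    return layers

layers = sample_dirbn_topics()
print([layer.shape for layer in layers])
```

Since the concentration of each child topic sums to the total weight of its incoming connections, the weights control both which parent topics a child inherits from and how tightly it concentrates around their mixture.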